class: center, middle, inverse, title-slide

.title[
# Sampling in Action: The M&M Challenge
]
.author[
### S. Mason Garrison
]

---
layout: true

<div class="my-footer">
<span>
<a href="https://psychmethods.github.io/coursenotes/" target="_blank">Methods in Psychological Research</a>
</span>
</div>

---
class: middle

# Sampling in Action: The M&M Challenge

---
## Roadmap

- M&M Sampling Activity
- Analysis and Discussion
- Advanced Sampling Concepts

---
# What is Sampling?

.pull-left[
- Selecting a subset of a population
- Used to estimate characteristics of the whole
- Critical in research and statistics
]

.pull-right[
]

---
## M&M Sampling Activity

- Objective: Demonstrate sampling principles using M&M's
- Hands-on experience with data collection and analysis

--

- Materials:
  - Small packages of plain M&M's (one per student)
  - Napkins for sorting

---
## M&M Sampling Procedure

*Steps in the activity*

- Distribute M&M packages and materials

--

- Sort M&M's by color on napkins

--

- Record frequency of each color

--

- Calculate percentages for each color

--

- Hypothesize population color distribution

--

- Form pairs to pool data

--

- Pool data for entire class
  - using Google Sheets (and some R magic)

---
# Data Collection

.pull-left[
.center[
<img src="data:image/png;base64,#sampling_files/figure-html/unnamed-chunk-3-1.png" width="100%" style="display: block; margin: auto;" />

Scan to input your data!
.footnote[https://docs.google.com/spreadsheets/d/1D4i8e0pTrqwLk_FjMFtkimqhmtOrBf-X6OU9RT57m_Q]
]]

--

.pull-right[
- Distribute M&M packages and materials
- Sort M&M's by color on napkins
- Record frequency of each color
- Calculate percentages for each color
- Hypothesize population color distribution
- Form pairs to pool data
- Pool data for entire class
]

---
# Analysis in Action

- What we'll get from the class data

<img src="data:image/png;base64,#sampling_files/figure-html/unnamed-chunk-4-1.png" width="65%" style="display: block; margin: auto;" />

---
# Source Code

.tiny[

```r
library(dplyr)    # mutate(), select(), summarise(), across()
library(tidyr)    # pivot_longer()
library(ggplot2)  # plotting

set.seed(123) # For reproducibility

# Define the students and colors
students <- c("Tukey", "Gauss", "Noether", "Fisher", "Bayes", "Pearson", "Student", "Fiducial", "Neyman", "Cochran")
colors <- c("Blue", "Brown", "Green", "Red", "Yellow")

# Simulate the total number of M&Ms for each student
total_mms <- sample(15:20, length(students), replace = TRUE)

# Simulate the counts of each color for each student by splitting each
# student's total across the colors (equal color probabilities assumed).
# The earlier sample(1:total_mms, ...) call used only the first total.
color_counts <- t(sapply(total_mms, function(n) {
  rmultinom(1, size = n, prob = rep(1 / length(colors), length(colors)))
}))

# Create the dataframe
df_syn <- data.frame(Name = students, color_counts)
colnames(df_syn)[-1] <- colors

# Calculate the percentages
df_syn <- df_syn %>%
  mutate(Total = rowSums(across(Blue:Yellow))) %>%
  mutate(Blue_perc = Blue / Total * 100,
         Brown_perc = Brown / Total * 100,
         Green_perc = Green / Total * 100,
         Red_perc = Red / Total * 100,
         Yellow_perc = Yellow / Total * 100)

# Reshape the data to long format
df_long_syn <- df_syn %>%
  pivot_longer(cols = c(Blue_perc, Brown_perc, Green_perc, Red_perc, Yellow_perc),
               names_to = "Color", values_to = "Percentage")

# Plotting the data
stacked_plot <- df_long_syn %>%
  ggplot(aes(x = Name, y = Percentage, fill = Color)) +
  geom_col(position = "stack") +
  labs(title = "M&M Color Distribution by Student", x = "Student", y = "Percentage") +
  scale_fill_manual(values = c("Blue_perc" = "blue", "Brown_perc" = "brown", "Green_perc" = "green",
                               "Red_perc" = "red", "Yellow_perc" = "yellow"),
                    labels = c("Blue_perc" = "Blue", "Brown_perc" = "Brown", "Green_perc" = "Green",
                               "Red_perc" = "Red", "Yellow_perc" = "Yellow")) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Overall distribution of M&Ms
overall_distribution <- df_syn %>%
  select(Blue, Brown, Green, Red, Yellow) %>%
  summarise(across(everything(), sum)) %>%
  pivot_longer(cols = everything(), names_to = "Color", values_to = "Count")

overall_plot <- overall_distribution %>%
  ggplot(aes(x = Color, y = Count, fill = Color)) +
  geom_col() +
  labs(title = "Overall M&M Color Distribution", x = "Color", y = "Total Count") +
  scale_fill_manual(values = c("Blue" = "blue", "Brown" = "brown", "Green" = "green",
                               "Red" = "red", "Yellow" = "yellow")) +
  theme_minimal()

stacked_plot
```

```r
# Display both plots
library(gridExtra)
#grid.arrange(stacked_plot, overall_plot, ncol = 2)
```
]

---
## Analysis in Action

<img src="data:image/png;base64,#sampling_files/figure-html/unnamed-chunk-5-1.png" width="65%" style="display: block; margin: auto;" />

---
# Sample Size Effects

.pull-left-narrow[
<table class="table table-striped table-hover table-condensed" style="color: black; width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> Sample Size </th>
   <th style="text-align:left;"> Accuracy </th>
   <th style="text-align:left;"> Variability </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;font-weight: bold;"> Individual </td>
   <td style="text-align:left;"> Low </td>
   <td style="text-align:left;"> High </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> Paired </td>
   <td style="text-align:left;"> Medium </td>
   <td style="text-align:left;"> Medium </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;"> Class-wide </td>
   <td style="text-align:left;"> High </td>
   <td style="text-align:left;"> Low </td>
  </tr>
</tbody>
</table>
]

--

.pull-right-wide[
<img
src="data:image/png;base64,#sampling_files/figure-html/unnamed-chunk-6-1.png" width="90%" style="display: block; margin: auto;" />
]

---
class: middle

# Advanced Sampling Concepts

---
## Relating to Sampling Methods

.pull-left[
- Simple random sampling
  - Each M&M package as a random sample
- Stratified sampling
  - If we sorted M&M bags by production date
  - Could this improve representativeness?
]

--

.pull-right[
- Cluster sampling
  - If we sampled entire boxes of M&M packages
  - Potential production batch effects?
- Systematic sampling
  - If we selected every nth M&M package from the production line
  - Could introduce cyclical biases?
]

---
# Potential Biases in M&M Sampling

.pull-left[
- Production process biases
  - Color distribution variations between factories
  - Akin to sampling frame bias in surveys
- Selection bias
  - If students choose a package based on their favorite package color
  - Akin to non-random sample selection in research
]

.pull-right[
- Measurement bias
  - Errors in counting or recording M&M colors
  - Akin to survey response errors
- Non-response bias
  - If some students don't participate, or eat their M&M's before counting
  - Akin to survey non-respondents
]

---
# Importance of Representative Samples

.pull-left[
- What if we only sampled from one factory?
- Implications for psychological research
  - Generalizing from sample to population
  - External validity of research findings
]

--

.pull-right[
- Strategies for improving representativeness
  - Increasing sample size (more M&M packages)
  - Diversifying sample sources (different stores, batches)
  - Random selection procedures
  - Weighting techniques for unequal probability samples
]

---
# Wrapping Up...

.pull-left[
## Key Takeaways

1. Sampling is essential for understanding populations
2. Larger samples generally provide more precise estimates
3. Be aware of potential biases in sampling
4. Always aim for representative samples
]

.pull-right[
]
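
---
# Appendix: Sample Size Effects, Simulated

The second takeaway (larger samples, better estimates) can be checked with a small simulation. This is a minimal sketch under stated assumptions, not the analysis behind the class figures: the equal color proportions in `true_props`, the 17 candies per package, and the class of 20 students are all hypothetical values chosen for illustration, and `prop_blue()` is a helper defined here.

```r
set.seed(2024)  # arbitrary seed, only for reproducibility

# ASSUMPTION: hypothetical "true" color proportions, not real M&M figures
true_props <- c(Blue = 0.20, Brown = 0.20, Green = 0.20, Red = 0.20, Yellow = 0.20)

# Draw one sample of n candies and return the observed proportion of blue
prop_blue <- function(n) {
  draws <- sample(names(true_props), size = n, replace = TRUE, prob = true_props)
  mean(draws == "Blue")
}

# ASSUMPTION: about 17 candies per package and 20 students in the class
sample_sizes <- c(Individual = 17, Paired = 34, Class = 340)

# Repeat each sampling scheme 2000 times and record the spread of the estimate
sd_by_size <- sapply(sample_sizes, function(n) sd(replicate(2000, prop_blue(n))))

round(sd_by_size, 3)
```

The standard deviation of the estimated blue proportion should shrink from individual to paired to class-wide samples, roughly in proportion to one over the square root of the sample size.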